A Markov Chain Monte Carlo Expectation Maximization Algorithm for Statistical Analysis of DNA Sequence Evolution with Neighbor-Dependent Substitution Rates
Abstract
The evolution of DNA sequences can be described by discrete-state, continuous-time Markov processes on a phylogenetic tree. We consider neighbor-dependent evolutionary models, in which the instantaneous rate of substitution at a site depends on the states of the neighboring sites. Neighbor-dependent substitution models are analytically intractable and must be analyzed using either approximate or simulation-based methods. We describe statistical inference for neighbor-dependent models using a Markov chain Monte Carlo expectation maximization (MCMC-EM) algorithm. In the MCMC-EM algorithm, the high-dimensional integrals required in the EM algorithm are estimated using MCMC sampling. The MCMC sampler requires simulation of sample paths from a continuous-time Markov process, conditional on the beginning and ending states and the paths of the neighboring sites. An exact path sampling algorithm is developed for this purpose.
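The path-sampling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the exact algorithm developed in the paper: it uses plain rejection sampling (forward simulation, keeping only paths that end in the required state), and it ignores neighbor dependence by assuming a single site with constant rates. The names `JC_RATES`, `simulate_path`, and `sample_conditioned_path` are hypothetical, and the Jukes-Cantor-style rate matrix is an assumption chosen only for illustration.

```python
import random

STATES = "ACGT"

# Assumed rate matrix (Jukes-Cantor style): every substitution has rate 1.0.
JC_RATES = {a: {b: 1.0 for b in STATES if b != a} for a in STATES}


def simulate_path(rates, start, t_end, rng):
    """Forward-simulate a continuous-time Markov chain on [0, t_end].

    rates[a][b] is the instantaneous rate of the substitution a -> b.
    Returns a list of (time, state) records; the first is (0.0, start).
    """
    path = [(0.0, start)]
    t, state = 0.0, start
    while True:
        targets = [b for b in rates[state] if rates[state][b] > 0.0]
        total = sum(rates[state][b] for b in targets)
        if total == 0.0:
            return path  # absorbing state: no further jumps
        t += rng.expovariate(total)  # exponential waiting time
        if t >= t_end:
            return path  # next jump would fall outside the interval
        weights = [rates[state][b] for b in targets]
        state = rng.choices(targets, weights=weights)[0]
        path.append((t, state))


def sample_conditioned_path(rates, start, end, t_end, rng, max_tries=100_000):
    """Draw a path from `start` conditioned on being in `end` at time t_end.

    Rejection sampling: accepted paths follow the conditional distribution,
    but the acceptance rate can be low when the end state is unlikely.
    """
    for _ in range(max_tries):
        path = simulate_path(rates, start, t_end, rng)
        if path[-1][1] == end:
            return path
    raise RuntimeError("no path accepted within max_tries")
```

For example, `sample_conditioned_path(JC_RATES, "A", "G", 0.5, random.Random(0))` returns a list of jump records that starts in `A` and ends in `G`. In the neighbor-dependent setting described in the abstract, the rates at a site would change whenever a neighboring site changes state, which this sketch does not attempt to handle.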
Similar resources
Pseudo-Likelihood Analysis of Codon Substitution Models with Neighbor-Dependent Rates
Recently, Markov processes for the evolution of coding DNA with neighbor dependence in the instantaneous substitution rates have been considered. The neighbor dependency makes the models analytically intractable, and previously Markov chain Monte Carlo methods have been used for statistical inference. Using a pseudo-likelihood idea, we introduce in this paper an approximative estimation method ...
3 Bayesian Methods in Biological Sequence Analysis
Hidden Markov models, the expectation–maximization algorithm, and the Gibbs sampler were introduced for biological sequence analysis in the early 1990s. Since then the use of formal statistical models and inference procedures has revolutionized the field of computational biology. This chapter reviews the hidden Markov and related models, as well as their Bayesian inference procedures and algorithms...
Ole F. Christensen, Asger Hobolth and Jens L. Jensen: Pseudo-likelihood analysis of context-dependent codon substitution models
We consider Markov processes for coding DNA sequence evolution. In context-dependent models the instantaneous substitution rate for a codon depends on the neighboring codons. This makes the model analytically intractable, and previously Markov chain Monte Carlo methods have been used for statistical inference. We introduce an approximative estimation method based on pseudo-likelihood that makes...
An Empirical Bayesian Analysis of Simultaneous Changepoints in Multiple Data Sequences
Motivated by applications in genomics, finance, and biomolecular simulation, we introduce a Bayesian framework for modeling changepoints that tend to co-occur across multiple related data sequences. We infer the locations and sequence memberships of changepoints in our hierarchical model by developing efficient Markov chain Monte Carlo sampling and posterior mode finding algorithms based on dyn...